Community detection is one of the most studied problems on complex networks. Although hundreds of methods have been proposed so far, there is still no universally accepted formal definition of what a good community is. As a consequence, evaluating and comparing the quality of the solutions produced by these algorithms is still an open question, despite constant progress on the topic. In this article, we investigate how a multi-criteria evaluation can solve some of the existing problems of community evaluation, in particular the question of multiple equally relevant solutions of different granularity. After exploring several approaches, we introduce a new quality function, called MDensity, and propose a method that can be related both to a widely used community detection metric, the Modularity, and to the Precision/Recall approach, ubiquitous in information retrieval.


I. INTRODUCTION 

Community detection is one of the most studied problems on complex networks. Countless papers have been published on this topic, in particular during the last 15 years. However, most of this publication frenzy has been centered on the problem of proposing new methods.

As the number of proposed methods increased, the problem of comparing them became more and more important. As a consequence, some of the most influential works on the subject propose methods to compare partitions with one another. As the number of scalable methods increases, the question of which method to use becomes equally pressing.

In this paper, we will first review some of the existing methods to evaluate the quality of a partition. In a second section, we will discuss what good communities and good partitions are, and will argue for the interest of using a two-criteria approach. Finally, we will go step by step from a two-criteria method directly inspired by information retrieval techniques to a more relevant method grounded on the Modularity.

II. EVALUATION METHODS 

Several techniques have already been proposed to compare partitions (and therefore the algorithms that produced them) with one another. We can classify these methods in three families:

• Single score metrics 

• Evaluation on generated networks 

• Evaluation on real networks with ground truth 

We will review these three types of evaluation methods in the next sections.

A. Single score metrics

Single score metrics use quality functions that associate a score to a given partition of a given network. To compare two partitions of the same network, one can simply compare their scores according to this quality function. Historically, the ancestor of community detection is the problem of graph partitioning: finding sets of nodes of predefined sizes such that the number of edges between them is minimum. In this problem, the quality function was simply the number of edges between communities. As this metric loses its significance as soon as we do not fix the size and number of the clusters to find, different metrics are used to evaluate the quality of community partitions. Several of these metrics are detailed in [1], while [2] compares some of them on real cases. However, the quality function that is by far the most popular is the Modularity. Initially introduced in [3], this quality function is defined as the difference between the ratio of edges that fall inside communities and the expected value of this ratio in a randomized version of the network. We will discuss the Modularity in detail in the later sections of this article. It is now so popular that it is sometimes considered and used as a definition of communities. However, since the demonstration of its resolution limit [4], using the Modularity alone to evaluate communities is discouraged. Adaptations of the Modularity called resolution-free methods have been proposed, notably in [5], [6], but have also been criticized. Surprise [7] is another interesting function proposed recently, measuring how unlikely a given partition is compared to a null model.

Using a single score metric to evaluate communities has advantages:

• One can compare partitions not only on networks with a known solution but on any particular network of interest.

• For any network, a best solution can be unambiguously identified.

But also some drawbacks:

• Using a quality function means using a fixed definition of what a community is. However, this definition is arbitrary, as there is no consensus on the topic.

• On real networks, quality functions are likely to have only one maximal value. However, it is known that networks can have several meaningful levels of decomposition. Among several potential problems raised by this observation, a perfect partition at a suboptimal level might get a lower score than an imperfect decomposition at the optimal level.


B. Evaluation on generated networks 

This approach consists in generating random networks with a well-defined community structure, known by construction, running several algorithms on these networks, and checking how well the partitions found match the expected ones. From ad-hoc methods such as the one used in [3], more advanced network generators have been proposed. The LFR benchmark [8] is the most widely used; it allows tuning several parameters such as the number of nodes, the average degree and the distribution of community sizes. To compare the solutions found to the reference, the most widely used function is the Normalized Mutual Information [5], but other approaches are possible, such as topological approaches [10], or cluster-analysis methods modified to take into account the network structure [11].

Advantages: 

• Although proposing networks with a community structure requires a loose definition of what a community is, consensual results are easier to reach than with a quality function, where the definition is formal. It is easier to recognize a good community when we see one than to give a universal definition.

• Variations on usual communities, such as hierarchical decomposition, fuzzy communities or overlapping communities can be tested.

Drawbacks:

• Nothing guarantees that the networks and communities generated are realistic. This means that some algorithms might be more (or less) efficient on these generated networks than on real ones.

• This category of evaluation aims at finding a universally most efficient algorithm. By varying the parameters of the network, it might be possible to refine this classification, some algorithms being more efficient for dense networks, or for small communities for instance, but the efficiency of the algorithms is not evaluated on the particular graph that one wants to study.


C. Evaluation on real networks with ground truth 

One solution to the problem of unrealistic networks and community structures in generated networks is to work with real networks and real communities. This was the idea of the first evaluations, using small networks such as the Zachary karate club [12] or Lusseau's dolphin network [13], on which the communities found can be compared to a known real decomposition, or be studied graphically or intuitively. However, transposing this method to larger networks was not possible for a long period. Recently, some adequate networks were proposed, such as in [2], [14]. In [15], a slightly different approach is proposed: instead of comparing partitions to an a priori ground truth, experts assign relative and absolute scores to several solutions on the same network.

However, these methods also have weaknesses, as discussed in [16] for instance. The problem is that this approach compares solutions that are purely topological with ground truths that depend on many more factors. It is for instance possible to have a ground truth community composed of several connected components, a situation that does not make sense from a topological perspective.

Other advantages and drawbacks are similar to those of the method using generated networks. In a nutshell, whereas the network properties are no longer a concern, the reference solution becomes less reliable, and this approach can only be used to pick a universally best performing method.


D. Potential advantages of a multiple-criteria 
quality function 

As we have seen in the previous section, using a quality function to evaluate community structures has the important advantage of evaluating partitions on a given network of interest, instead of searching for a universally superior algorithm. In the most common application of community detection, one wants to study a given network, knows several methods applicable to it, but does not know which one to use. The quality function can tell which algorithm is the most efficient on that network. However, we know that with a single metric, the choice will be arbitrary, because a network can have several relevant community structures of different granularities, and one of these solutions will, in the general case, have a higher quality score than the others. By using a multiple-criteria function and the notion of Pareto optimality, we will be able to identify several potentially relevant solutions, defined as all solutions present on the Pareto frontier. The property of these solutions is that no other solution is at least as good on all considered criteria and strictly better on one of them. The preference for one of these solutions relative to the others can still be decided by attributing weights to the considered criteria. In the following sections, we will propose some possible relevant criteria.
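The notion of Pareto frontier used above can be sketched in a few lines. This is only an illustration: the function name `pareto_front` and the example scores are hypothetical, and every criterion is assumed to be "higher is better".

```python
def pareto_front(solutions):
    """Return the names of the solutions that no other solution dominates.

    `solutions` maps a name to a tuple of criteria, all to be maximized.
    A solution is dominated when another one is at least as good on every
    criterion and strictly better on at least one.
    """
    front = []
    for name, scores in solutions.items():
        dominated = False
        for other_name, other in solutions.items():
            if other_name == name:
                continue
            at_least_as_good = all(o >= s for o, s in zip(other, scores))
            strictly_better = any(o > s for o, s in zip(other, scores))
            if at_least_as_good and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (criterion 1, criterion 2) scores for four partitions.
example_solutions = {
    "A": (0.9, 0.2),
    "B": (0.6, 0.6),
    "C": (0.5, 0.5),  # dominated by B on both criteria
    "D": (0.2, 0.9),
}
print(pareto_front(example_solutions))  # ['A', 'B', 'D']
```

Partitions A, B and D are kept as candidate solutions; choosing among them then depends on the weights given to each criterion.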
FIG. 1: Visualization of Precision and Recall for several algorithms


III. A FIRST APPROACH: COMMUNITY 
DETECTION AS A CLASSIFICATION PROBLEM 

To propose a new method for community evaluation, we need to go back to the definition of what a good community is. The consensual, widely accepted definition is that a good community structure must identify groups of nodes that are well connected among themselves, while having few edges between the groups.

The key idea that we will explore in this paper is that communities are defined as a trade-off between two objectives: having communities as dense as possible, and as well separated as possible from the rest of the network.

Creating well-separated communities, without other constraint, is simple. By increasing the size of communities, we can add as many edges as we want in them. The optimal solution for the separation objective is to define communities as the connected components of the network, resulting in the absence of edges between communities.

In a similar fashion, without other constraint, it is straightforward to define communities as dense as we want. The optimal solution for this objective, reachable in any network, consists in communities composed of cliques, of size 2 in the worst case, plus a certain number of singletons. All non-singleton communities have a density of 1.

Between these two extreme but uninformative solutions lie the sought ones.

This idea was already present in the graph-partitioning problem, which consists in optimizing the separation while fixing the size of communities. As the intern density depends on the number of intern edges and the size of the communities, this has the consequence of fixing a lower bound on the value of the intern density.

This idea is again central in the common definitions of communities, such as the conductance [17], defined as the ratio of extern edges over intern edges, or the modularity, in a more indirect fashion.

However, when comparing partitions, these two opposite objectives are usually merged in a single metric, to determine which solution is better than all the others. As discussed earlier, this is often not pertinent in the case of community detection, as several meaningful levels of decomposition might exist.

The problem is to decide which metrics we can use to meaningfully represent the separation and the definition. In the next subsection, we will propose a first simple approach, and discuss its strengths and weaknesses.


A. Precision and Recall 

One of the most common uses of a twofold metric to evaluate the result of an algorithm can be found in the classification problem, through the usage of Precision and Recall. The first approach we propose is simply to consider the problem of community detection as a classification problem, allowing us to evaluate it as such.

We can define the problem as follows: for an undirected graph $G = (V, E)$, the instances to classify are all the vertex pairs of the network, $VP = \{(i, j) : i \in V, j \in V, i \neq j\}$.

The two categories are:

EDGE : $\{vp \in VP : vp \in E\}$

NOTEDGE : $\{vp \in VP : vp \notin E\}$

A community detection algorithm is therefore seen as a classifier that recognizes the vertex pairs most likely to belong to $E$. This makes sense relative to the definition of community detection: communities must be dense, and few edges must fall between communities, so a community detection algorithm tries to leave as few edges as possible outside of the communities, while having as few non-edges as possible inside communities.

To stay close to the classification problem, we split the data into a training set and a validation set. In our experiments, we create a graph $G_{training}$, corresponding to an original graph $G$ from which we remove a percentage $P$ of its edges.

These removed edges $RE$ constitute part of the validation set $VS$. However, the validation set must be representative of the training set used by the algorithm, which was composed of vertex pairs, not only of edges of $G$. Therefore, our validation set must be composed of both. The vertex pairs not belonging to $E$ are essential to compute the rate of false positives, that is, vertex pairs that fall inside communities but are not edges of $G$. The total number of vertex pairs in the validation set is equal to $|VS| = P \cdot |VP|$, and the set is composed of $RE \cup FPT$, with $FPT$ a random selection of potential edges taken from $VP \setminus E$, with size $|FPT| = |VS| - |RE|$.
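The construction of the training graph and of the validation set can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function name `build_validation_set`, the `seed` argument and the rounding of set sizes are all hypothetical choices.

```python
import random
from itertools import combinations


def build_validation_set(nodes, edges, P, seed=0):
    """Split a graph into training edges, removed edges (RE) and sampled non-edges (FPT).

    Assumption: P is a fraction in [0, 1]; set sizes are rounded to integers.
    """
    rng = random.Random(seed)
    edge_list = sorted({tuple(sorted(e)) for e in edges})
    edge_set = set(edge_list)
    all_pairs = list(combinations(sorted(nodes), 2))                # VP
    non_edges = [pair for pair in all_pairs if pair not in edge_set]  # VP \ E

    removed = rng.sample(edge_list, round(P * len(edge_list)))      # RE
    vs_size = round(P * len(all_pairs))                             # |VS| = P * |VP|
    fpt_size = min(len(non_edges), max(0, vs_size - len(removed)))  # |FPT| = |VS| - |RE|
    fpt = rng.sample(non_edges, fpt_size)                           # FPT

    removed_set = set(removed)
    training_edges = [e for e in edge_list if e not in removed_set]
    return training_edges, removed, fpt


# Toy graph: two triangles joined by one edge.
toy_nodes = range(6)
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
training, removed_edges, false_candidates = build_validation_set(toy_nodes, toy_edges, P=0.3)
```

The community detection algorithm is then run on the training edges only, and each pair of the validation set is counted as a positive prediction when both endpoints fall in the same community.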

Precision and Recall are computed according to their usual definition. As a reminder, they are defined as:

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

with TP = True Positive, FP = False Positive and FN = False Negative.

$C$ : set of all communities
$n$ : number of nodes in a graph
$m$ : number of edges in a graph
$I_c$ : number of edges inside community $c$
$I$ : number of edges inside all communities
$I'_c$ : number of edges inside community $c$ according to a null model
$I'$ : number of edges inside all communities according to a null model
$k_c$ : number of vertex pairs inside community $c$
$k$ : number of vertex pairs inside all communities
$p$ : number of vertex pairs in the whole graph

TABLE I: Notations


1. Network interpretation of Precision and Recall 

In the context of community detection, Precision and Recall can be written in a more traditional, network-centered form. To simplify, we will not make a difference between the test set and the whole network. On the whole network, the Recall corresponds to the fraction of all edges that fall inside communities. This value is often used in evaluating community detection, for instance in the Modularity, defined as the difference between this observed ratio and the expected ratio in a null model.

Precision corresponds to the ratio of edges inside communities over the number of vertex pairs inside communities, i.e. the global density of communities. More formally, on the whole network:

$$Precision = \frac{I}{k}, \qquad Recall = \frac{I}{m}$$

To keep the equations simple, we will use the notations of Table I for all equations relative to communities.
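As an illustration of these whole-network formulas, the following sketch computes Precision = I/k and Recall = I/m for a partition given as a node-to-community mapping. The function name `precision_recall` and the toy graph are hypothetical; the graph is assumed simple and undirected.

```python
def precision_recall(edges, membership):
    """Return (Precision, Recall) = (I / k, I / m) for a partition.

    `edges` is an iterable of node pairs, `membership` maps node -> community.
    """
    edge_set = {tuple(sorted(e)) for e in edges}
    m = len(edge_set)

    # I: number of edges whose two endpoints share a community.
    intern_edges = sum(1 for u, v in edge_set if membership[u] == membership[v])

    # k: number of vertex pairs inside communities, from community sizes.
    sizes = {}
    for community in membership.values():
        sizes[community] = sizes.get(community, 0) + 1
    intern_pairs = sum(s * (s - 1) // 2 for s in sizes.values())

    precision = intern_edges / intern_pairs if intern_pairs else 0.0
    recall = intern_edges / m if m else 0.0
    return precision, recall


# Two triangles joined by one edge, each triangle being one community.
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
example_membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
precision, recall = precision_recall(example_edges, example_membership)
# Here I = 6, k = 6 and m = 7, so Precision = 1 and Recall = 6/7.
```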


2. Precision and Recall to evaluate communities

Fig. 1 is an illustration of the results with five widely used algorithms, on a real graph and on a generated graph produced by using the LFR benchmark [8]. The networks used are:

• LFR, μ = 0.5: a network generated with the LFR benchmark, with standard parameters and a mixing parameter μ = 0.5. Communities are still well defined, but most algorithms already fail to identify them. n = 5000, m = 50000

• DicoSyn [18], [19]: a synonymy network, in which nodes represent verbs and edges represent proximity in meaning. We chose this network because it is easily interpretable, and one can observe by comparing different algorithms that partitions with high modularity seem less relevant than some solutions of lower modularity. n = 9146, m = 51423

The methods used are:

• eigen: The eigenvector decomposition method described in [20], as implemented in the igraph package [21].

• fg: The fastgreedy modularity optimization method, described in [22], as implemented in the igraph package. It provides a hierarchical decomposition.

• wt: The walktrap algorithm, described in [23], as implemented in the igraph package.

• im: The infohiermap method, described in [24], as implemented by the authors. Can identify several levels.

• louvain: The louvain method, described in [25], as implemented by the authors. Can identify several levels.

For each network, we generated 20 different test sets, and ran each algorithm on each of these test sets. Individual results are displayed as small dots, and a large dot corresponds to the average values. For methods producing several solutions, we consider only the default one.

On the generated graph, we can observe that only two methods, infomap and louvain, are on the Pareto frontier, the other ones being outperformed on both aspects at least by infomap. The solution proposed by the louvain method has a higher Recall than the infomap method, but a lower Precision. The correct decomposition is also on the Pareto frontier.

On the real graph DicoSyn, it seems more difficult for a solution to outperform most others. Instead, each method is superior to the other ones on one aspect, but inferior on another. The eigen method is the only one to be outperformed.

These examples illustrate the interest of a multi-objective approach:

• It is possible to compare algorithms on a particular graph, and not only on test graphs.

• It is possible to eliminate some methods as Pareto dominated.

• It is also possible to keep several partitions as potential solutions, and to choose the most interesting one depending on our objective.


3. Limits of the Precision and Recall method

By using the typical definition of Precision and Recall, we have a first solution for the evaluation of the quality of partitions. However, this method clearly has some weaknesses. First, community detection is not usually defined as a classification problem, and there is no guarantee that these metrics correctly recognize good communities.

The second weakness is that we can easily add a large number of partitions on the Pareto frontier, even though these partitions are clearly not relevant in terms of communities. We can show that, starting from a Pareto optimal partition P, we can in most cases generate a new valid partition P' with either a higher Precision or a higher Recall. A higher Recall can be obtained by merging communities having edges between their nodes (I increases while m stays constant). A higher Precision can be reached, for instance, by splitting sparse communities into cliques of size at least 2. Note that the resulting solutions are not necessarily Pareto optimal for the graph: a solution dominating them could exist, but they are not Pareto dominated by P.

In a nutshell, the problem of this method is that it is too simple to generate irrelevant solutions on the Pareto frontier. In fact, this problem is linked to the size of the communities found: a partition composed of large communities will tend to have a high Recall and a low Precision, and vice versa. In the next section, we will propose a solution taking into account this problem of the size of communities.


4. Single run versus several runs

In this first approach, because we wanted to stay as close as possible to the classification problem, we used a training set and a test set. However, as our problem is not really a classification problem, there is no need to do so, and we could compute the values of Precision and Recall on the whole network. For the other solutions presented in this paper, we will directly compute solutions on the whole network, and compute our criteria accordingly. However, we can note that the idea of evaluating several runs on a slightly randomized version of the network can nevertheless be interesting, because of the known problem of the instability of the algorithms [26]: the same algorithm, when confronted with two slightly different versions of the same network, can converge to very different partitions.


IV. SIZE, SEPARATION AND MODULARITY

The method presented in the previous section has the drawback of not being directly grounded on the traditional works on community detection. In this section, we will propose another approach which is based on a generalization of the graph partitioning problem, and we will show that this approach is directly related to the most widely used quality function in community detection, Modularity.


A. Graph partitioning generalization 

The ancestor of community detection is the problem of graph partitioning. This problem can be expressed in the following manner: for a given network and a given number of clusters of similar sizes, the best partition is the one that results in a minimum number of edges lying between clusters. It is necessary to fix the number and size of the objective clusters, otherwise minimizing inter-cluster edges is achieved by a trivial solution, such as leaving only the node of lowest degree in a community and all other nodes in another. As the number of extern edges is the complement of the number of intern edges, a typical measure of the quality of a partition for communities of fixed properties can be unambiguously defined as:

$$Q_{partitioning} = 1 - \frac{m - I}{m} = \frac{I}{m} = Recall$$

which is identical to the Recall defined in Section III. However, two partitions can only be compared according to this metric if they are composed of clusters of similar sizes. The problem of community detection can be seen as a generalization of graph partitioning, searching for the best partition not only for fixed properties of communities, but for the best solution considering all possible combinations of number and size of communities.

The two opposite metrics that we propose to use are therefore $Q_{partitioning}$ and an indicator of the size of communities, which corresponds to the difficulty of having intra-community edges. The more vertex pairs are inside communities, the easier it is to have edges inside communities. To represent the size of the communities, we want to use a metric between 0 and 1, with 0 corresponding to the largest communities, for which a maximal value of $Q_{partitioning}$ can be reached. We use the fraction of vertex pairs that fall outside communities:

$$cSize = 1 - \frac{k}{p}$$

FIG. 2: Visualization of the ratio of intern edges and size of communities

We can remark that this metric is related to diversity indexes: small values will correspond to larger, more uneven clusters than large values. Said differently, the closer we get to a single cluster containing all nodes, the less useful information the partition contains about the modular structure of the network.

Fig. 2 is a visualization of these two metrics on the same networks used in the previous section. This time, we display all levels of decomposition provided by hierarchical algorithms. One interesting property is that, if the graph were a random one, the proportion of edges inside communities would be proportional to the proportion of vertex pairs inside communities. Therefore, on a random graph, there is a relation $Q_{partitioning} = 1 - cSize$. We represent this relation as a straight dashed line in our graph.

Using this random case as a baseline allows us to balance the improvement in $Q_{partitioning}$ when augmenting the size of communities against the mechanical improvement due to the higher number of vertex pairs inside communities. By taking the difference between the separation produced by the partition and the baseline, we can have a measure of the improvement yielded by this partition. This function $Q'$ can be defined as:

$$Q' = Q_{partitioning} - (1 - cSize) = \frac{I}{m} - \frac{k}{p}$$

This is a partial solution to the problem we encountered using the Precision/Recall approach: if we consider $Q'$ instead of the raw $Q_{partitioning}$ value, it is no longer possible to provide trivial solutions by merging communities of an initial partition, as these trivial solutions, if they are not relevant, will come closer to the baseline and therefore have a lower $Q'$.
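The three quantities of this subsection can be computed together. The sketch below is illustrative only: `partition_scores` is a hypothetical helper, the graph is assumed simple and undirected, and the baseline is the uniform random case described above, in which the expected value of Q_partitioning is 1 - cSize.

```python
def partition_scores(edges, membership):
    """Return (Q_partitioning, cSize, Q') for a partition of a simple graph."""
    edge_set = {tuple(sorted(e)) for e in edges}
    m = len(edge_set)                      # number of edges
    n = len(membership)                    # number of nodes
    p = n * (n - 1) // 2                   # vertex pairs in the whole graph

    # I: edges inside communities.
    intern_edges = sum(1 for u, v in edge_set if membership[u] == membership[v])

    # k: vertex pairs inside communities.
    sizes = {}
    for community in membership.values():
        sizes[community] = sizes.get(community, 0) + 1
    intern_pairs = sum(s * (s - 1) // 2 for s in sizes.values())

    q_partitioning = intern_edges / m
    c_size = 1 - intern_pairs / p
    q_prime = q_partitioning - (1 - c_size)  # distance to the random baseline
    return q_partitioning, c_size, q_prime


# Two triangles joined by one edge.
demo_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
everything = {node: "all" for node in range(6)}

scores_triangles = partition_scores(demo_edges, two_triangles)
scores_everything = partition_scores(demo_edges, everything)
```

Merging everything into one community reaches the maximal Q_partitioning of 1 but lands exactly on the baseline (Q' = 0), whereas the two-triangle partition keeps a positive Q'.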


B. Relation with the Modularity 

Modularity is usually defined as a sum over all edges or a sum over communities. However, using our notation, it can also be written as:

$$Modularity = \frac{I}{m} - \frac{I'}{m}$$

where $I'$ corresponds to the expected number of edges inside communities according to a null model. As a consequence, if the same null model is used, $Q'$ is strictly equivalent to the Modularity. Previous works have shown that a better null model consists of a random network with the same degree distribution as the original graph. We can adapt our solution to this improvement.
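To make the equivalence explicit, here is a short derivation under an assumption that the text implies but does not spell out: the null model is a uniform random graph in which the $m$ edges are spread evenly over the $p$ vertex pairs, so that the $k$ pairs inside communities are expected to hold $I' = k \, m / p$ edges.

```latex
\begin{align*}
I' &= k \cdot \frac{m}{p} \\
\text{Modularity} &= \frac{I}{m} - \frac{I'}{m}
                   = \frac{I}{m} - \frac{k}{p}
                   = Q_{partitioning} - (1 - cSize)
                   = Q'
\end{align*}
```

With a degree-preserving null model, $I'$ is no longer equal to $k \, m / p$, which is why the size criterion has to be adapted as well.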


1. Modularity Decomposition Graph 

Based on the previous observations, we can propose a variation of the $Q_{partitioning}$ and $cSize$ criteria. We already proposed to replace $Q_{partitioning}$ by $Q'$, and we have just seen that using the typical Modularity is an improvement over it. However, we must also change the $cSize$ accordingly. By using the property $\frac{k}{p} = \frac{I'}{m}$ in a random graph, we can define our modified $cSize$, that we call Corrected Community Size ($CCS$), as:

$$CCS = 1 - \frac{I'}{m} = 1 - (Q_{partitioning} - Modularity)$$
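A possible computation of the Modularity and of the CCS is sketched below. It relies on an assumption not stated explicitly in the text: the degree-preserving null model is taken to be the usual configuration-model expectation, under which a community whose nodes have total degree d_c is expected to contain d_c² / (4m) edges. The function name `modularity_and_ccs` is hypothetical.

```python
def modularity_and_ccs(edges, membership):
    """Return (Modularity, CCS) = (I/m - I'/m, 1 - I'/m).

    Assumption: I' is the configuration-model expectation, i.e. the sum over
    communities of (total degree of the community)^2 / (4 m).
    """
    edge_set = {tuple(sorted(e)) for e in edges}
    m = len(edge_set)

    intern_edges = 0    # I
    degree_sum = {}     # total degree per community
    for u, v in edge_set:
        if membership[u] == membership[v]:
            intern_edges += 1
        for node in (u, v):
            community = membership[node]
            degree_sum[community] = degree_sum.get(community, 0) + 1

    expected_intern = sum(d * d for d in degree_sum.values()) / (4 * m)  # I'
    modularity = intern_edges / m - expected_intern / m
    ccs = 1 - expected_intern / m
    return modularity, ccs


# Two triangles joined by one edge: each community has total degree 7, m = 7.
ccs_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ccs_membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mod_value, ccs_value = modularity_and_ccs(ccs_edges, ccs_membership)
# I' = (49 + 49) / 28 = 3.5, so CCS = 0.5 and Modularity = 6/7 - 0.5.
```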
FIG. 3: Visualization of the Modularity Decomposition Graph


FIG. 4: Modularity Decomposition Graph on a generated network where the maximum of modularity does not match the sought solution (horizontal axis: CCS; legend: fg, im, louvain, wt, real)


The idea of this measure is similar to the $cSize$: it represents how big communities are in terms of the probability of containing links, but this time considering the degrees of their nodes. Fig. 3 represents the score of partitions using the same algorithms and graphs as previously. A consequence of using these two metrics to define a Pareto frontier is that, if we know the solution of maximal modularity, it is not possible to find a solution on the Pareto frontier with a value of $CCS$ below the value corresponding to the modularity optimum. That is to say, it is not possible to find a Pareto optimal solution with "larger" communities than the solution of maximal modularity. As a consequence, in the LFR example, most solutions become Pareto-dominated by the correct partition known by construction.

While working with networks generated by the LFR benchmark, we were surprised to observe that the correct solution was always the solution of highest modularity, despite the so-called resolution limit. This is because, with the parameters most commonly used in the literature, the communities are in the correct resolution for modularity. To avoid this bias, we generated a network with the LFR benchmark using the following parameters:

• Number of nodes: 50000 

• Mixing parameter μ: 0.5

• Size of communities : [11,11] 

• Average degree: 20 

• Maximal degree: 20 

Note that the LFR benchmark is not fully appropriate to generate this kind of large graph with small, dense communities, because of its universal μ value for each node. This is the reason for our choice to generate cliques of fixed size, and to set the degrees of nodes accordingly. Although unrealistic, this is not a major concern for our purpose, as we are just interested in obtaining a clear community partition with a suboptimal modularity score, and not in comparing community detection algorithms on realistic networks.

The results on this graph are shown in Fig. 4. We can observe that, although both Louvain and InfoMap found the correct decomposition, the Louvain method also identifies solutions of higher modularity, but of larger size. The fast greedy method identifies a solution with a Modularity score relatively close to the optimum, but very far from it in terms of the size of communities.

2. Limitations of the Modularity Decomposition Graph 

Compared to the first multiple-criteria approach that we proposed, the Modularity Decomposition has several advantages: it is directly based on the Modularity, a strongly established quality function for partitions, and it does not allow adding points on the Pareto front by generating arbitrarily large communities. Compared to using the Modularity alone, it allows differentiating between suboptimal but pertinent partitions (with a value of modularity below the optimum but corresponding to "smaller" communities) and merely suboptimal partitions, which yield solutions that are not on the Pareto frontier.

FIG. 5: Comparison of partitions using Modularity and MDensity

However, this method still has some drawbacks. First, it is still possible to find trivial solutions by proposing arbitrarily small communities. Secondly, as seen in Fig. 4, this solution might not help us to prefer the correct solution over the solution of optimal modularity when relevant. Whereas the correct solution is on the Pareto frontier, the gain in CCS is of an order of magnitude comparable to the loss in Modularity.


V. MODULARITY AND MDENSITY 

The Modularity is a method based on the comparison of the ratio of extern (or intern) edges to a null model. As such, this metric is clearly a descendant of the partitioning problem. However, compared to the traditional, informal definition of communities, which states a trade-off between clearly separated and well defined communities, it seems that the modularity alone is biased (for sparse networks) toward an optimization of the separation of communities, at the cost of a poor definition. This phenomenon is a consequence of the resolution limit of the modularity, and should be balanced by a symmetrical measure ensuring the optimization of the density of the communities.

The modularity is a measure of the improvement of the separation of communities (the ratio of inter-community edges) over a null model. We introduce the Module-Density, or MDensity, which is the improvement of the density of communities over a null model, weighted by the separation.


A. Introducing MDensity 


We can start by computing the overall improvement in
density over a null model, defined as the sum, over all
communities, of each community's gain in density weighted
by its relative size in terms of number of pairs:


$$ \mathrm{DensityGain} = \sum_{c=1}^{|C|} \frac{I_c - I'_c}{k_c} \times \frac{k_c}{k} = \frac{I - I'}{k} $$


However, this function has trivial maximal solutions,
namely communities composed of cliques. We therefore
balance it by the ratio of intern edges inside
communities, I/m:


$$ \mathrm{MDensity} = \frac{I - I'}{k} \times \frac{I}{m} $$


We can then conveniently express the MDensity from 
the Modularity, which gives us another interpretation of 
the MDensity as the modularity balanced by the density. 


$$ \mathrm{MDensity} = \frac{I}{k} \times \mathrm{Modularity} $$
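
As a minimal illustration, both quantities can be computed from an edge list and a partition. The sketch below is not the authors' implementation: the function names are hypothetical, and it assumes that I' is the expected number of intern edges under the standard degree-preserving null model of Modularity, that is, the sum over communities of (total degree of the community)^2 / (4m). It uses k, the raw number of vertex pairs inside communities.

```python
def partition_counts(edges, communities):
    """Return (I, I_exp, k, m) for an undirected simple graph.

    I     : number of intern edges (both endpoints in the same community)
    I_exp : expected intern edges, ASSUMING the degree-preserving null model
    k     : number of vertex pairs inside communities
    m     : total number of edges
    """
    community_of = {v: c for c, members in enumerate(communities) for v in members}
    m = len(edges)
    intern = 0
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intern += 1
    expected = sum(d * d for d in degree_sum) / (4 * m)
    pairs = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    return intern, expected, pairs, m


def modularity(edges, communities):
    intern, expected, _, m = partition_counts(edges, communities)
    return (intern - expected) / m


def mdensity(edges, communities):
    intern, expected, pairs, m = partition_counts(edges, communities)
    if pairs == 0:
        # Only singleton communities: no pairs, so we return 0 by convention
        # (this convention is an assumption of the sketch).
        return 0.0
    return (intern - expected) / pairs * intern / m


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = [{0, 1, 2}, {3, 4, 5}]
intern, expected, pairs, m = partition_counts(edges, communities)
q = modularity(edges, communities)
md = mdensity(edges, communities)
```

In this toy graph both communities are cliques, so I/k = 1 and the MDensity equals the Modularity. For any partition with at least one pair inside a community, `mdensity` equals `(I / k) * modularity` by construction.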


However, one of the interesting features of Modularity
is that it uses a null model based on the preservation of
the degree distribution. We also want to integrate this
feature into the MDensity. The intuition is that a higher
density can be reached more easily between nodes of high
degree, and is therefore less significant than between
nodes of low degree. We have defined I' as
the number of edges expected in communities. As edges
are distributed at random, we can define k' = I'/p as the
degree-corrected number of vertex pairs inside
communities, based on the chosen null model. Our final
definition of MDensity is now:


$$ \mathrm{MDensity} = \frac{I - I'}{k'} \times \frac{I}{m} $$
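
The degree-corrected variant can be sketched in the same way. Two points are assumptions of this sketch rather than statements of the text: k' is read as I'/p, with p = m / (n(n-1)/2) the density of the whole graph, and I' is taken from the degree-preserving null model of Modularity. The function name is hypothetical.

```python
def mdensity_degree_corrected(edges, communities, n):
    """MDensity using the degree-corrected pair count k'.

    ASSUMPTIONS: k' = I'/p, where p is the density of the whole graph
    (n nodes, m edges), and I' is the expected number of intern edges
    under the degree-preserving null model.
    """
    community_of = {v: c for c, members in enumerate(communities) for v in members}
    m = len(edges)
    intern = 0
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intern += 1
    expected = sum(d * d for d in degree_sum) / (4 * m)  # I'
    p = m / (n * (n - 1) / 2)                             # graph density
    k_prime = expected / p                                # assumed k' = I'/p
    return (intern - expected) / k_prime * intern / m


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = [{0, 1, 2}, {3, 4, 5}]
value = mdensity_degree_corrected(edges, communities, n=6)
```

On this toy graph, I' = 3.5 and p = 7/15, so k' = 7.5 instead of the raw k = 6, and the score is 2/7 (about 0.286) instead of 2.5/7 (about 0.357).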

To understand the difference in nature between MDensity
and Modularity, we can compare what a maximal score
corresponds to for these two metrics. In both cases, a
maximal score can be obtained only if the difference
between the number of intern edges and the number of
edges expected under the null model is maximal. If we
consider an infinite network with communities of finite
size, the expected number of edges tends to zero, which
maximizes this difference. For Modularity, the sum of
these differences is divided by the total number of
edges, so any decomposition of an infinite network into
connected components of finite size yields the maximal
Modularity of 1, whatever the properties of these
connected components. They could be, for instance,
arbitrarily long chains with arbitrarily large cliques at
their extremities, which is probably not what most people
would recognize as good communities. On the contrary, for
the MDensity to be equal to 1, all communities must have
a density of 1. As a consequence, a perfect MDensity
score can be reached in an infinite graph only if the
communities are cliques without any links between them.
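
A finite version of this argument can be checked numerically. The sketch below builds q disjoint components of s nodes each, either cliques or chains, and takes the components as communities. It is only an illustration: the helper names are hypothetical, I' is assumed to follow the degree-preserving null model of Modularity, and the MDensity uses the raw pair count k rather than k'.

```python
def scores(edges, communities):
    """Return (Modularity, MDensity), with MDensity based on the raw pair count k."""
    community_of = {v: c for c, members in enumerate(communities) for v in members}
    m = len(edges)
    intern = 0
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intern += 1
    expected = sum(d * d for d in degree_sum) / (4 * m)  # assumed null model
    pairs = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    return (intern - expected) / m, (intern - expected) / pairs * intern / m


def disjoint_components(q, s, kind):
    """Build q disjoint components of s nodes each; kind is 'clique' or 'chain'."""
    edges, communities = [], []
    for c in range(q):
        nodes = list(range(c * s, (c + 1) * s))
        communities.append(set(nodes))
        if kind == "clique":
            edges += [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]
        else:
            edges += list(zip(nodes, nodes[1:]))
    return edges, communities


clique_scores = scores(*disjoint_components(100, 10, "clique"))
chain_scores = scores(*disjoint_components(100, 10, "chain"))
```

With 100 components of 10 nodes, both partitions reach a Modularity of 0.99 (in this setting it equals 1 - 1/q), while the MDensity is 0.99 for the cliques but only 0.198 for the chains, whose internal density is 2/s = 0.2.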

Fig. [^ presents the results on the same graphs as
previously, plus the Zachary karate club and a generated
network with 2 hierarchical levels. We can see that using
these two metrics often allows us to eliminate many of
the proposed partitions as not Pareto optimal. Even
though the fast greedy Modularity optimization method
proposes a complete hierarchy of solutions, none of them
is on the Pareto frontier. The result on each network
seems relevant:

• For the LFR benchmark, the correct solution is, as
expected, in the top-right area, challenged only by the
Infomap solution.

• For the DicoSyn network, the Infomap solution,
qualitatively identified as more relevant, sees its
slightly lower Modularity compensated by a large
MDensity score. The profile of this plot is
interestingly similar to the next one.

• For the LFR network with a solution of suboptimal
Modularity, the sought solution is the only one to have
both a high Modularity and a high MDensity.

• For the Zachary karate club, as the network is small,
we also represent the partitions obtained by the
original edge betweenness algorithm of Girvan and
Newman [^, identified as eb. The solution of maximal
modularity found by the Louvain algorithm corresponds to
the partition into 4 communities, often considered the
most relevant. Despite the large number of partitions
considered in this case, this solution is the only
Pareto optimal one.

• For the generated network with hierarchical
communities, the two solutions are on the Pareto
frontier, one with higher Modularity and the other with
higher MDensity. Different algorithms find one or the
other of these solutions, illustrating the interest of
our 2-criteria approach.
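
The Pareto filtering used in these comparisons can be sketched as follows. This is a generic two-criteria dominance test, not the authors' code. The function name, the partition labels and the scores are hypothetical, chosen only to illustrate the mechanism.

```python
def pareto_front(scores):
    """Return the names of the partitions that no other partition dominates.

    `scores` maps a partition name to (Modularity, MDensity); both criteria
    are to be maximized.
    """
    def dominates(a, b):
        # a dominates b if it is at least as good on both criteria
        # and differs from b (hence strictly better on at least one).
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return sorted(
        name for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    )


# Hypothetical (Modularity, MDensity) scores for four candidate partitions.
example = {
    "P1": (0.40, 0.10),
    "P2": (0.38, 0.30),
    "P3": (0.35, 0.25),
    "P4": (0.30, 0.05),
}
front = pareto_front(example)
```

In this made-up example, P3 is dominated by P2 and P4 by P1, so only P1 and P2 remain on the frontier: one favours Modularity, the other MDensity.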

B. Relation to Precision/Recall 

If we step back to our first approach, based on Precision
and Recall, and compare it to the Modularity and MDensity
pair, we can observe a relation between them. Modularity,
as already stated, corresponds to the improvement of the
observed Recall over the Recall of a similar partition in
a random network. The MDensity can be defined as the
Precision multiplied by the Modularity.
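
This relation can be written out explicitly. The lines below assume that Precision and Recall are I/k and I/m respectively (the share of vertex pairs inside communities that are edges, and the share of edges that fall inside communities), and they use the MDensity based on the raw pair count k. With the degree-corrected k', the Precision factor would become I/k'.

```latex
\begin{align*}
\mathrm{Recall} &= \frac{I}{m}, &
\mathrm{Precision} &= \frac{I}{k}, \\
\mathrm{Modularity} &= \frac{I - I'}{m}
  = \mathrm{Recall} - \frac{I'}{m}, \\
\mathrm{MDensity} &= \frac{I - I'}{k} \times \frac{I}{m}
  = \frac{I}{k} \times \frac{I - I'}{m}
  = \mathrm{Precision} \times \mathrm{Modularity}.
\end{align*}
```

Here I'/m plays the role of the Recall expected for a similar partition under the null model.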

VI. IDENTIFYING BEST SOLUTIONS 

Contrary to the previous approach, in the generated
graphs with a solution of suboptimal modularity, the gain
in MDensity of the correct solution is much higher than
the gain in Modularity of the other solutions on the
Pareto frontier. We can use this fact to combine the two
criteria into a single quality function. Of course, by
doing so, we fall back into the problems of
single-criterion metrics described in the introduction.
As a consequence, we include in this metric a parameter
α ∈ [0,1], which describes the relative importance we
attribute to each criterion. Our combined quality
function, 2FQ, for Two Fold Quality, is defined as:

$$ \mathrm{2FQ} = \alpha \, \mathrm{Modularity} \times (1 - \alpha) \, \mathrm{MDensity} $$

On the network of suboptimal modularity, we can now
compute that the sought partition has the highest value
of 2FQ for α ∈ [0, 0.965]. It will therefore be
considered the best solution unless we choose α > 0.965,
which corresponds to a choice that mostly considers the
separation of the communities, and not their density. We
have to stress that the intervals of α corresponding to
maximal 2FQ are only relative to the tested partitions,
and that other partitions might exist that would
completely modify them. The length of the interval also
has no meaning unless we know all the solutions on the
Pareto frontier.
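
The way such an interval of α arises can be sketched on two candidate partitions. An important caveat: read literally as a product, the formula above would rank partitions identically for every α strictly between 0 and 1, since α(1 - α) is then a common factor. The sketch therefore assumes a weighted-sum reading, α × Modularity + (1 - α) × MDensity, which is consistent with the interval discussion but is an assumption, not the stated definition. The function names and the scores are hypothetical.

```python
def two_fold_quality(modularity, mdensity, alpha):
    """Weighted-sum reading of 2FQ (an ASSUMPTION of this sketch)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * modularity + (1.0 - alpha) * mdensity


def best_partition(scores, alpha):
    """Name of the partition with the highest 2FQ for a given alpha."""
    return max(scores, key=lambda name: two_fold_quality(*scores[name], alpha))


def tie_alpha(a, b):
    """Alpha at which a and b get the same 2FQ.

    `a` is the partition with the higher MDensity, `b` the one with the
    higher Modularity; each is a (Modularity, MDensity) pair.
    """
    gain_mdensity = a[1] - b[1]
    loss_modularity = b[0] - a[0]
    return gain_mdensity / (gain_mdensity + loss_modularity)


# Hypothetical (Modularity, MDensity) scores, not taken from the experiments.
scores = {"sought": (0.60, 0.50), "max_modularity": (0.62, 0.10)}
threshold = tie_alpha(scores["sought"], scores["max_modularity"])
```

With these made-up scores, the "sought" partition is preferred for every α below the threshold 0.4 / 0.42 (about 0.952), and the partition of maximal Modularity only above it. The numbers do not reproduce the 0.965 reported in the text.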


VII. CONCLUSION 

In this article, we presented a new approach to evaluate
and compare partitions into communities. This method is
grounded in the usual definition of communities as a
trade-off between the "internal definition" and the
"separation" of the communities. It makes use of an
already widely used metric, the Modularity, and can also
be related to the Precision and Recall approach used in
classification. Because it has two criteria, it allows
identifying several relevant partitions. However, as
neither criterion has trivial solutions, it can
drastically limit the number of partitions considered
relevant.

This method opens several possibilities for future work, 
among them: 

• Adapt it to overlapping communities. This is not
trivial, but some works already exist to adapt
modularity to overlapping communities, such as m-

• Adapt existing Modularity optimization algorithms to
2FQ optimization, and test the efficiency of such a
method on simulated and real benchmarks.

• Explore in detail the properties of the Pareto
maximal partitions, in the case of clearly or weakly
defined communities.

To conclude, we want to stress the importance of
comparing several partitions. Community detection is
applied in many fields and for many purposes, but one
popular usage is to apply it to an existing network on
which one wants to gain insights, and to interpret the
partition found as an intrinsic property of the studied
network. For instance, one can study the sizes of the
communities (or the distribution of their sizes), the
number of inter-community edges, or more generally
interpret the meaning of particular nodes being clustered
together. This is a perfectly legitimate practice.
However, since each community detection algorithm has its
own definition of what a community is, it appears
important for this usage to take several relevant
solutions into consideration, and to check whether the
observations made on one partition are confirmed by
others. For instance, it appears clearly from our
observations and from the resolution limit that methods
based on modularity optimization tend to find large,
sparse communities when applied to large networks, while
other methods could find a completely different but
relevant solution composed of much smaller communities.
We hope our multiple-criteria approach can be used in
such cases to consider several partitions, not by
applying some algorithms at random, but by picking the
most relevant partitions among several.